Computational Biology and Chemistry

Elsevier BV

Preprints posted in the last 90 days, ranked by how well they match Computational Biology and Chemistry's content profile, based on 23 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.

1
Reveal Principles of Codon Optimization via Machine Learning

Deng, F.; Li, H.; Sun, D.; Duan, G.; Sun, Z.; Xue, G.

2026-04-21 bioinformatics 10.64898/2026.04.16.718958 medRxiv
Top 0.1%
6.4%

High protein expression is usually desirable in industry and research, and codon optimization is widely used to achieve it. Codon optimization methods fall into two branches: classical methods, which build cost functions from empirical rules, and AI methods, which use neural networks to learn codon choice principles from endogenous genes. Here we develop one codon optimization tool for each branch, OptimWiz 2.1 and OptimWiz 3.0, respectively. Fusion protein fluorescence measurements indicate that both OptimWiz 2.1 and OptimWiz 3.0 outperform all other commercially available codon optimization tools. Principles of codon optimization are revealed in the process of machine learning on both tools.

2
Minimal Amino Acid Alphabet for Protein Design

Pubal, K.; Kushnir, K.; Spiwok, V.; Louzecka, K.; Setnicka, V.; Lipovova, P.

2026-03-06 bioinformatics 10.64898/2026.03.06.710107 medRxiv
Top 0.1%
4.1%

Proteins are built from 20 canonical amino acids. It is interesting to explore whether proteins can be formed from significantly reduced amino acid alphabets. Our bioinformatics survey of UniProt (more than 250 M sequences) revealed that proteins composed of reduced amino acid alphabets (< 10) are extremely rare among existing proteins. Next, we used computational protein design to design proteins composed of all 1,013 possible alphabets of 2-10 early amino acids (Ala, Asp, Glu, Gly, Ile, Leu, Pro, Ser, Thr, and Val). The length of all proteins was 100 amino acid residues. Small amino acid alphabets preferred simple helices or helix bundles. Larger amino acid alphabets allowed for the design of more complex structures. A protein composed of 8 amino acids (Ala, Asp, Gly, Leu, Val, Ser, Thr, and Pro) was successfully experimentally verified. It belongs to fibronectin type III domain β-sheet-rich architecture. Attempts to experimentally verify designs composed of 6 and 4 amino acids were unsuccessful. We show by a computational experiment without an experimental validation that inverse folding programs, namely ProteinMPNN, can stabilize designed proteins within the same amino acid alphabet. Our results show that globular proteins may have formed early in evolution. Furthermore, we show that it is possible to design proteins with interesting properties for biotechnology and synthetic biology.
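As a quick check on the count quoted in this abstract, the 1,013 alphabets can be reproduced by counting every subset of size 2 to 10 drawn from the ten early amino acids: 2^10 - 1 - 10 = 1,013. The Python sketch below is only illustrative; the function names are hypothetical and are not taken from the preprint.

```python
from itertools import combinations
from math import comb

# The ten "early" amino acids listed in the abstract.
EARLY = ["Ala", "Asp", "Glu", "Gly", "Ile", "Leu", "Pro", "Ser", "Thr", "Val"]


def count_alphabets(n_residues=10, min_size=2, max_size=10):
    """Count subsets of n_residues items with min_size..max_size members."""
    return sum(comb(n_residues, k) for k in range(min_size, max_size + 1))


def enumerate_alphabets(residues, min_size=2, max_size=None):
    """Yield every reduced alphabet (as a tuple) in the requested size range."""
    if max_size is None:
        max_size = len(residues)
    for k in range(min_size, max_size + 1):
        yield from combinations(residues, k)


if __name__ == "__main__":
    print(count_alphabets())                      # closed-form count
    print(len(list(enumerate_alphabets(EARLY))))  # explicit enumeration
```

Both the closed-form count and the explicit enumeration agree with the number stated in the abstract.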

3
RNA Folding Nearest Neighbor Parameters Including the Modification 1-Methyl-Pseudouridine

Kierzek, E.; Shabangu, T. S.; Hiltke, O. M.; Miaro, M.; Arteaga, S.; Znosko, B. M.; Jolley, E. A.; Bevilacqua, P. C.; SantaLucia, J.; SantaLucia, H. A.; Lin, H.; Metkar, M.; Aviran, S.; Soszynska-Jozwiak, M.; Kierzek, R.; Mathews, D. H.

2026-04-11 bioinformatics 10.64898/2026.04.09.717343 medRxiv
Top 0.1%
1.9%

Nearest neighbor analysis is commonly used to estimate RNA folding stabilities. In this contribution, we report a set of RNA folding nearest neighbor parameters for estimating free energy change for RNA sequences including 1-methyl-pseudouridine. Development of mRNA vaccines has identified 1-methyl-pseudouridine as a key nucleobase modification for suppressing innate immune responses. However, the contributions of these modifications to RNA folding stability were unclear. Our new parameters provide helical terms for 1-methyl-pseudouridine-adenine and 1-methyl-pseudouridine-guanine base pairs. The parameters also estimate loop stabilities for loops with 1-methyl-pseudouridine or a combination of 1-methyl-pseudouridine and uridine. These parameters are derived using 208 optical melting experiments and tested against an additional 16 optical melting experiments. On average, we find that substitution of uridine with 1-methyl-pseudouridine stabilizes RNA folding, with the extent of stabilization depending on adjacent sequence. The estimation of tRNA folding ensembles for tRNA sequences with 1-methyl-pseudouridine was significantly improved using the new nearest neighbor parameters. The new nearest neighbor parameters are provided as part of the RNAstructure software package. With these parameters, the secondary structures of natural sequences with 1-methyl-pseudouridine and mRNA therapeutics fully substituted with 1-methyl-pseudouridine can be modeled.

4
Physics-Guided Deep Neural Networks: Correcting Physical Distortions in Protein Phase Separation Prediction

Wang, M.; Lu, T.; Song, Y.-h.; Li, Y.

2026-04-21 cell biology 10.64898/2026.04.18.719364 medRxiv
Top 0.2%
1.5%

Background: In computational biology, embedding known physical laws into deep learning models to construct "Physics-Informed Neural Networks" (PINNs) is a mainstream paradigm for enhancing model interpretability and extrapolation capability. However, in complex multi-physics coupling problems, there is a risk of competitive imbalance between the physical term and the flexible artificial intelligence (AI) residual term, causing the model to degenerate into a "black-box" fit and lose the original purpose of being physics-driven. Methods: In this study, targeting the problem of predicting protein liquid-liquid phase separation (LLPS) behavior in response to environmental factors (temperature, salt concentration), we identified physical distortions, gradient vanishing, and numerical instability in the initial physics-AI hybrid model. Three core correction strategies were proposed: (1) Weight Allocation Logic Reconstruction: Force the physical trunk weight to 1.0 at the output layer, suppressing the AI residual term to the perturbation level of 0.05 to 0.1, ensuring physics dominance; (2) Robust Physics Formula Construction: Abandon the unstable power function and introduce a combination of Softplus and logarithmic functions to stably simulate the nonlinear effects of charge shielding; (3) Gain Compensation Alignment: Apply gain compensation to the weak signal branch (temperature) to ensure its effective participation in optimization. Results: The optimized model maintained a fitting accuracy of R² ≈ 0.62 on the test set, while physical consistency was significantly enhanced. The model successfully restored the monotonic increase in solubility with temperature characteristic of UCST-type phase diagrams and correctly captured the nonlinear charge shielding features in the salt concentration response. The weights of key physical parameters (e.g., hydrophobic contribution w_h, net charge contribution w_ncpr) increased from <10⁻³ to the order of 10⁻², demonstrating the reactivation of the physical branch. Conclusions: The weight control, formula stabilization, and signal gain alignment strategies proposed in this study effectively address the classic problem of "AI hijacking" physics in physics-AI hybrid models. This work provides a universal solution for constructing biophysical predictive models that combine high fitting accuracy with strong physical interpretability.
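The abstract does not give the actual equations, so the following Python sketch only illustrates the kind of construction it describes: a numerically stable Softplus, a Softplus-plus-logarithm term standing in for charge shielding, and an output that gives the physical term a fixed weight of 1.0 while bounding the learned residual at 0.05 to 0.1. Every function name, functional form, and parameter here is an assumption made for illustration, not the authors' model.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x))."""
    # max(x, 0) + log1p(exp(-|x|)) never exponentiates a large positive number.
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def screening_term(salt: float, w: float = 1.0) -> float:
    """Hypothetical smooth, sub-linear salt response: log(1 + softplus(w * salt)).

    Unlike a raw power law such as salt ** w, this stays finite and
    differentiable for any real input, including zero or negative values.
    """
    return math.log1p(softplus(w * salt))


def combine(physics: float, residual: float, eps: float = 0.1) -> float:
    """Physics term with fixed weight 1.0 plus a residual bounded to [-eps, eps].

    tanh squashes the raw residual, so the learned correction can only
    perturb the physical prediction by at most eps (assumed 0.05 to 0.1).
    """
    return 1.0 * physics + eps * math.tanh(residual)


if __name__ == "__main__":
    for salt in (0.05, 0.15, 0.5, 1.0):
        print(salt, combine(screening_term(salt), residual=3.0, eps=0.05))
```

Bounding the residual with a squashing function is one simple way to keep a flexible correction from overriding the physical term; other choices (clipping, a penalty in the loss) would serve the same purpose.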

5
Graph Neural Networks (GNNs) for Protein-Ligand Interaction Prediction

Khilar, S.; Natarajan, E.

2026-04-24 bioinformatics 10.64898/2026.04.23.720519 medRxiv
Top 0.2%
1.5%

Prediction of protein-ligand interactions in modern drug discovery has evolved through the combination of artificial intelligence and structural bioinformatics, in particular Graph Neural Networks (GNNs). GNN models have achieved a high degree of accuracy in determining binding affinity and identifying active compounds, as evidenced by [1] [2] [3] [4], but their limited explainability remains an important obstacle in biomedical research. This work focuses on the interpretation of protein-ligand interactions at the molecular level, a rapidly developing area of GNN research. Current studies use visualization techniques, attention mechanisms, and model-based feature attribution to make models more robust and to reduce false binding predictions. In addition, graph pooling strategies, message-passing optimization, self-supervised learning, transfer learning, and contrastive learning are increasingly used to improve representation learning. Furthermore, integrating molecular docking simulations, hybrid deep learning architectures, and protein language models gives more reliable and biologically meaningful predictions of protein-ligand interactions. Such approaches identify key ligand atoms and binding residues, as well as the physicochemical factors influencing affinity, in a way that follows chemical reasoning. This work identifies the challenges of developing biologically significant explanations, achieving transparency, and handling the effect of dataset biases on interpretability. It also examines in depth the integration of protein language models to establish more reliable pathways for future research, covering hybrid architectures, transparent and energy-efficient GNNs, and scientifically grounded AI models for drug discovery. The work highlights that XGNNs establish a connection between deep learning and biochemical expertise with increased confidence, which will enhance the accuracy of predictive and computational models.

6
Protein Language Modeling and Evolutionary Analysis Reveal an N-terminal Determinant of Functional Divergence in Cytochrome P450s from Sophora tonkinensis

Qiao, Z.; Wang, J.; Qin, B.; Wei, F.; Liang, Y.

2026-03-07 plant biology 10.64898/2026.03.06.710024 medRxiv
Top 0.2%
1.4%

The N-terminal signal sequences of plant cytochrome P450 enzymes are recognized as critical determinants for subcellular localization and functional diversification, yet their evolutionary drivers and mechanisms remain largely unresolved. In this study, the evolutionary trajectories of these signals were systematically decoded through the integration of the protein language model ESM-2 with phylogenetic and selection analyses. A conserved functional fingerprint was identified. This region may serve as the essential endoplasmic reticulum targeting signal and be evolutionarily decoupled from adjacent surfaces under positive selection during lineage-specific expansions. A functional-adaptive decoupling model is proposed to explain this pattern, wherein a conserved functional core is maintained while surrounding interfaces diversify. This evolutionary architecture is interpreted as the outcome of a two-step cycle: an initial phase of positive selection driving functional innovation, followed by pervasive neutral evolution that facilitates structural exploration and potentiates future adaptations. This work demonstrates how interpretable machine learning can be integrated with evolutionary theory to reconcile neutralist and selectionist perspectives on protein evolution. A novel framework is thus provided for understanding the layered evolution of protein modules, where structural constraint, adaptive innovation, and neutral drift operate on distinct tiers to generate functional diversity.

7
Combining amino acid frequency and 1D convolutional neural network embeddings for the identification of protein-protein interactions using a random forest classifier

Sindhi, N. A.; Pawar, N.; Dixson, J.; Garcia, D.

2026-05-18 bioinformatics 10.64898/2026.05.15.725340 medRxiv
Top 0.2%
1.3%

Predicting protein-protein interactions is a fundamental problem in molecular biology. Experimental approaches for identifying protein-protein interactions are time-consuming and labor-intensive, motivating the development of efficient computational alternatives, including machine learning-based methods. However, conventional machine learning methods often rely on manually engineered features that require substantial domain expertise. In this study, we propose a two-stage framework to address these limitations. In the first stage, a one-dimensional convolutional neural network autoencoder is used to automatically learn latent representations from protein sequences. The quality of these features is evaluated through reconstruction error, reflecting how accurately the model reconstructs the original sequence. In the second stage, these learned features are combined with amino acid frequency-based features to form a hybrid feature set for predicting protein-protein interactions. A systematic comparison is performed between models trained on frequency features alone and those using a hybrid representation. The comparison showed that incorporating one-dimensional convolutional neural network-derived latent features improved the models' performance in predicting protein-protein interactions. The dataset was split into training, validation, and test sets. Nested cross-validation was employed, with inner loops for hyperparameter tuning and outer loops for model selection. The random forest classifier achieved the best performance, with a mean receiver operating characteristic-area under curve of 0.91 and a test F1-score of 0.87. These results highlight the effectiveness of integrating deep feature learning with ensemble methods for predicting protein-protein interactions and build upon previous work focused on this fundamental problem.

Author Summary: Protein-protein interactions are fundamental in all biological processes. However, predicting these interactions is a key problem in molecular biology. Computational approaches have been tested to address this problem. We applied a mix of machine learning and deep learning to gain insight into the qualities of proteins that engage in interaction. First, we trained a deep learning model, which automatically learned features of the primary sequence, reducing bias in the actual prediction process. We combined these features, or latent representations, with amino acid frequency features of protein sequences, and called the two together "hybrid features." Then we performed a systematic comparison of amino acid frequency features alone against hybrid features, across four different machine learning classifiers. Our results suggest that the random forest classifier performed best among all four classifiers at predicting interactions between proteins. We propose that this approach could be used to improve efficiency in testing protein-protein interactions at the bench and may have applications to other biologically relevant molecular interactions.
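The hybrid feature set described in this abstract can be pictured with a short Python sketch: a 20-dimensional amino acid frequency vector is concatenated with a latent vector for each protein, and the two proteins' vectors are joined to describe a candidate pair. The latent vectors below are made-up placeholders standing in for autoencoder output, and the function names and pairing scheme are assumptions; the preprint's actual feature layout is not specified in the abstract.

```python
# The 20 canonical amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(seq):
    """Return the relative frequency of each canonical amino acid in seq."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    total = 0
    for ch in seq.upper():
        if ch in counts:  # skip ambiguous or non-standard symbols
            counts[ch] += 1
            total += 1
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def hybrid_features(seq, latent):
    """Concatenate frequency features with a (placeholder) latent vector."""
    return aa_frequencies(seq) + list(latent)


def pair_features(seq_a, latent_a, seq_b, latent_b):
    """Describe a candidate interacting pair by joining both proteins' features."""
    return hybrid_features(seq_a, latent_a) + hybrid_features(seq_b, latent_b)


if __name__ == "__main__":
    # Hypothetical two-dimensional latent vectors, for illustration only.
    x = pair_features("MKTAYIAK", [0.12, -0.40], "GGSLVAAG", [0.05, 0.31])
    print(len(x))
```

Vectors of this form could then be passed to any standard classifier, for example scikit-learn's `RandomForestClassifier`, to compare frequency-only and hybrid inputs.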

8
Enzymatic and Biophysical Analysis of two Highly Related Cytochrome P450 Reductases from Artemisia annua Reveals Differences in Their Ligand Interactions and Domain Motions

Mostert, B.; Judd, R.; Makris, T.; Xie, D.

2026-05-17 plant biology 10.64898/2026.05.13.725038 medRxiv
Top 0.2%
1.2%

Artemisinin is an effective antimalarial drug sourced from Artemisia annua, but its low and variable yields require enhancement either semi-synthetically or in planta to meet the global demand for treatment. Though essential enzymes have been identified in the artemisinin biosynthetic pathway, including an essential Cytochrome P450 monooxygenase (CYP71AV1), there are still many unknowns. Cytochrome P450 reductase 1 (herein, AaCPR1) has been experimentally confirmed as an electron transfer partner for CYP71AV1 in its three-step oxygenation of key artemisinin precursors. However, the recent discovery of a highly related CPR, herein AaCPR2, introduces the possibility that another, potentially more catalytically favourable interaction could exist for CYP71AV1. Therefore, enzyme kinetics and differential scanning fluorimetry (DSF) were used in the characterisation of both AaCPR1 and AaCPR2 to determine the existence and source of their catalytic differences. Enzyme activity tested at different cytochrome c and NADPH concentrations revealed that AaCPR1 had lower Km and higher kcat/Km values, while AaCPR2 had higher Vmax and kcat values. This suggests that AaCPR1 is more effective at reducing cytochrome c when substrate conditions are limiting, whereas AaCPR2 is more effective than AaCPR1 at reducing cytochrome c when substrate conditions are saturating. This implies a functional partitioning of the two enzymes on the basis of substrate availability. The DSF results provided deeper insight into the different protein-ligand interactions between the two enzymes. AaCPR2 reached lower maximum melting temperatures across all tested conditions, whereas AaCPR1 had higher maximum melting temperatures. Thus, AaCPR1 exhibits higher thermal stability and has a higher temperature threshold than AaCPR2. This contributes to the notion that the AaCPRs are functionally divergent also on the basis of temperature. The cumulative differences in melting behaviour between the two enzymes led to the hypothesis that AaCPR1 and AaCPR2 exhibit different domain motions that may lead to preferential catalysis for one redox partner over another. This was further supported by the prediction of a highly variable loop region between the two enzymes at the connecting domain just after the flexible hinge. If such loops are highly mobile, as predicted, then the residue differences therein could provide a bio-structural basis for the kinetic and thermal/biophysical differences observed between AaCPR1 and AaCPR2. These data support the conclusion that AaCPR1 and AaCPR2 possess fundamental biophysical differences despite their high degree of relatedness. Ultimately, these differences suggest differential metabolic functions of the two enzymes in artemisinin biosynthesis and/or other important secondary metabolic processes.
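The kinetic argument in this abstract (one enzyme faster at limiting substrate, the other faster at saturation) follows directly from the Michaelis-Menten equation, v = Vmax·[S] / (Km + [S]). Setting the two rates equal gives a crossover concentration [S]* = (Vmax1·Km2 - Vmax2·Km1) / (Vmax2 - Vmax1). The Python sketch below works this through with made-up parameter values chosen only to mimic the reported pattern; they are not the measured constants for AaCPR1 or AaCPR2, and the function names are hypothetical.

```python
def mm_rate(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


def crossover_concentration(vmax1, km1, vmax2, km2):
    """Substrate concentration at which two Michaelis-Menten curves intersect.

    Returns None when the curves do not cross at a positive concentration.
    """
    if vmax1 == vmax2:
        return None
    s = (vmax1 * km2 - vmax2 * km1) / (vmax2 - vmax1)
    return s if s > 0 else None


if __name__ == "__main__":
    # Illustrative values only (arbitrary units):
    # enzyme 1: lower Km, higher Vmax/Km; enzyme 2: higher Vmax.
    e1 = {"vmax": 1.0, "km": 1.0}
    e2 = {"vmax": 2.0, "km": 4.0}
    s_star = crossover_concentration(e1["vmax"], e1["km"], e2["vmax"], e2["km"])
    print("crossover:", s_star)
    for s in (0.5, 10.0):
        print(s, mm_rate(s, **e1), mm_rate(s, **e2))
```

Below the crossover the low-Km enzyme is faster, and above it the high-Vmax enzyme is faster, which is the partitioning by substrate availability that the abstract proposes.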

9
Learning from Drops: AI-Guided Integration of Liquid Biopsy Features in Cancer Studies

Andueza, M.; Villoslada-Blanco, P.; De Dreuille, B.; Alonso, L.; Sabroso-Lasa, S.; Pantel, K.; Alix-Panabieres, C.; Lopez de Maturana, E.; Malats, N.

2026-05-17 bioinformatics 10.64898/2026.05.12.724535 medRxiv
Top 0.2%
1.2%

Cancer is a major global health issue with rising incidence and mortality. Early detection, tumor characterization, and disease surveillance are crucial for timely and effective treatment, ultimately reducing mortality rates. Liquid biopsy (LB) has emerged as a valuable detection tool offering a non-invasive method to determine tumor-derived biomarkers in body fluids with demonstrated translational potential. To increase biomarker sensitivity, high-throughput sequencing platforms deliver massive volumes of data. Artificial Intelligence (AI) is pivotal in enabling huge and complex data integration. This contribution aims to assess the current state of integrative AI-based research in the LB field and provide methodological guidance. First, we conducted a PubMed search and found that the literature is sparse in studies integrating LB features, particularly by applying AI. When adopting the latter approach, defining the study objectives is crucial to guide the subsequent methodological aspects, including study design, patient selection criteria, sample size, nature of the LB features, and metadata to collect. Specifically, we propose strategies and tools for data preprocessing, including normalization and batch correction, as well as handling outliers and missing data. Furthermore, we recommend various Machine/Deep Learning approaches for feature selection techniques to ensure model robustness, and we highlight the importance of undergoing rigorous internal and external validations of the selected models. Assessing clinical utility and interpretability is often overlooked but fundamental for real-world implementation. In conclusion, we provide the LB scientific community with AI-based methodological guidance to bridge the two fields and enhance the integrative analysis of LB features.

Graphical abstract: Workchart for multiomics integrative studies in the liquid biopsy field. Note: CTCs, circulating tumor cells; ctDNA, circulating tumor DNA; TEPs, tumor-educated platelets; miRNA, microRNA; cfRNAs, cell-free RNAs.

10
Single-nucleus multiome sequencing identifies candidate regulators of mouse gastric epithelial homeostasis

Monteiro de Barros, M. R.; Bosch, K.; Soualhi, S.; Issa Bhaloo, S.; Chu, T.; Hemrajani, T.; Cho, J.; Ozuner, K.; Fu, R.; Geiger, H.; Robine, N.; Carter, J. E. B.; Maniatis, S.; Ryeom, S.; Tavare, S.; Nowicki-Osuch, K.

2026-04-27 genomics 10.64898/2026.04.23.720450 medRxiv
Top 0.3%
0.9%

Background & Aims: Gastric epithelial cells maintain homeostasis through dynamic self-renewal mechanisms involving stem and progenitor cells; however, identifying them has been challenging. This study aims to identify stem cells of healthy gastric epithelium and cell type-specific regulators defining gastric epithelial homeostasis via single-nucleus multiome analysis. Methods: Ten unique gastric samples were collected from 8-12 week old wildtype mice. Isolated nuclei were subjected to simultaneous profiling of gene expression and chromatin accessibility. After quality control, 31,598 cells were analyzed with Seurat and Signac using weighted-nearest neighbors analysis for joint RNA and ATAC clustering. Furthermore, SCENIC+, MultiVelo, EpiCHAOS and Cell plasticity score were used to uncover gene regulatory networks, cell state dynamics and lineage trajectories. Results: Our analyses were validated by the identification of known regulators of stem-cell differentiation into mature cell types. More importantly, they revealed previously uncharacterized regulatory networks comprising novel transcription factor combinations that define cell identities, including Ppara, Pparg, Arid5b and Sox5 as candidate regulators of parietal, foveolar, chief and neck cells, respectively. Further, our data support the identity of isthmus cells as stem-like cells of healthy gastric epithelium, as evidenced by epigenetic plasticity that simultaneously contains open chromatin states of all differentiated cell types in the absence of transcriptional reprogramming. Conclusion: Consistent with Waddington's epigenetic landscape hypothesis, gastric epithelial homeostasis is controlled by orchestrated epigenetic and transcriptional programs. Contrary to the prevailing hypothesis, stem cells can be defined not by a separate epigenetic state but by epigenetic superposition of differentiated cell states. Future work is needed to define the universality of these results.

11
Genome-wide computational prediction of miRNAs encoded by influenza A virus (H3N2) predicts target genes involved in pulmonary and antiviral innate immunity

Siddiqi, M. A.; Kumar, H.; Mazumder, M.

2026-05-18 bioinformatics 10.64898/2026.05.18.725090 medRxiv
Top 0.3%
0.9%

Influenza A virus (IAV) causes significant morbidity and mortality worldwide. Understanding how viral RNAs may regulate host genes through microRNA-like mechanisms can clarify pathogenesis and reveal therapeutic targets. In this study, we screened all eight IAV H3N2 RNA segments (PB2, PB1, PA, HA, NP, NA, M, and NS) using an ab initio computational pipeline; five segments (PB2, PB1, PA, HA, and M) met the VMir scoring threshold for further analysis, while NP, NA, and NS were excluded due to low pre-miRNA scores. Mature miRNAs were identified using MatureBayes, and target genes in the human genome were predicted with the miRDB server. From these targets, we selected two genes per qualifying segment (10 genes total) based on their functional relevance to influenza infection and supporting literature; all selected genes are unique to their respective segment. We identified 10 segment-specific target genes (IFNL1, DDX60, SAMHD1, MAVS, IRF4, BIRC2, AGO1, MAP3K1, NOD1, and TNFAIP1) and one common target across all five analyzed segments (CADM2). Gene Ontology and pathway analyses showed enrichment in interferon signaling, RIG-I-like receptor pathways, antiviral restriction, RNA interference, and inflammatory responses. Literature supports roles for these genes in pulmonary and antiviral innate immunity. Our findings provide a basis for experimental validation and may help the research community better understand influenza virus pathogenesis and identify novel therapeutic candidates.

12
Artificial intelligence aided design of peptides with custom secondary structure motifs and reduced amino acid alphabets

Brown, S. M.; Cohen, A. B.; Dean, S. N.

2026-05-01 bioinformatics 10.64898/2026.04.29.721096 medRxiv
Top 0.3%
0.9%

Proteins are highly diverse functional polymers where the specific sequence of amino acids, selected from a standard genetically-encoded alphabet of twenty (C20), determines the structure and ultimately the function of the resulting folded protein. This standard alphabet has been identified to be non-randomly distributed in physicochemical properties crucial to both structure-formation and function, often referred to as coverage theory. While machine learning models have drastically improved protein structure prediction, protein design has yet to see similar development. Here we therefore bridge contemporary biological theory with recent advancements in artificial intelligence (AI) to develop and evaluate a generative AI protein design model, trained on hundreds of thousands of proteins within the RCSB PDB, for custom secondary structure motifs using reduced amino acid alphabets. Results indicate an overall success in designing novel proteins with desired secondary structure motifs for a broad range of amino acid alphabets. Interestingly, this tool often captures the full three-dimensional tertiary structure of a target protein despite training only on physicochemical sequence space and DSSP secondary structure. The development of this model advances research across multiple disciplines, from general scientific AI/ML architecture development to protein design for biotechnology, astrobiology, and early-Earth evolutionary biology.

13
Alternative polyadenylation and the sex-specific gene expression program in hemp

Shivakumar, A.; Hunt, A. G.; Chakrabarti, M.

2026-05-17 plant biology 10.64898/2026.05.13.725035 medRxiv
Top 0.4%
0.8%

Hemp (Cannabis sativa) produces a wide array of medicinally significant compounds, including cannabidiol (CBD). These compounds are predominantly synthesized in female hemp inflorescences. This research uses next-generation sequencing-based transcriptome analysis with a 3′-end-directed approach to identify differentially expressed genes between male and female hemp plants at the early vegetative stage. A total of 886 differentially expressed genes (DEGs) were identified, a majority of which were upregulated in males compared to females. We hypothesized that alternative RNA processing contributes to sex-specific gene expression. To this end, 932 genes were identified that exhibited significant changes in poly(A) site usage when comparing males and females. These genes were much more likely to be differentially expressed, supportive of this hypothesis. Males tend to have longer 3′ UTRs with canonical motifs found in the Near-Upstream Elements (NUE), compared to the shorter 3′ UTRs in females, which have A-rich motifs near the cleavage site. This suggests that polyadenylation remodels hemp mRNAs, with distal poly(A) sites being preferred in males. To further investigate when this sex-specific gene expression program is established, RNA was isolated from plants at various developmental stages, such as developing seeds, four-day-old seedlings, and different developmental stages up to four weeks after sowing. Diagnostic male-specific genes were analyzed using RT/PCR. The results indicate that sex-specific gene expression is not evident in seeds but rather is set during or after germination.

Significance: Hemp males tend to have longer 3′ UTRs with canonical motifs found in the Near-Upstream Elements (NUE), compared to the shorter 3′ UTRs in females, which have A-rich motifs near the cleavage site. The sex-specific gene expression program is not yet established in mature seed but is set in the time between germination and 4 days of growth.

14
Labour Induction in low-risk women at 39 weeks of gestation: a Randomised trial in China (LIRIC) - Protocol of an open label, randomised controlled trial

Gao, H.; Shen, J.; Chen, D.; Mol, B. W.; Hun, W.; Liang, Z.; Bai, X.; Han, X.; Zhu, J.; Wang, H.; Liu, X.; Su, C.; Weng, R.; Liu, Y.; Li, W.; Zhang, D.

2026-05-26 obstetrics and gynecology 10.64898/2026.05.24.26354001 medRxiv
Top 0.4%
0.8%

Introduction: The ARRIVE trial first demonstrated that elective induction of labour (IOL) at 39 weeks in low-risk pregnancies reduced the likelihood of caesarean section (CS) without compromising perinatal safety; however, the generalizability of these findings remains debated, leading to uncertainty in clinical practice. The LIRIC trial aims to evaluate whether 39-week elective IOL reduces CS rates compared with expectant management, while exploring its impact on infant neurodevelopment and multi-omics profiles. Methods and analysis: This is a single-centre, open-label, randomized controlled trial in China. A total of 1,074 low-risk pregnant women (nulliparous or multiparous) will be randomly assigned (1:1 ratio) to either 39-week IOL or expectant management. The primary outcome is the CS rate. Secondary outcomes include a composite of severe neonatal morbidity and perinatal mortality and infant neurodevelopmental scores (Bayley-4 and ASQ-3), among others. Data analysis will follow the Intention-to-Treat (ITT) principle. Biospecimens will be collected for metagenomic and metabolomic analyses, with results to be reported separately. Ethics and dissemination: The protocol has been approved by the Ethics Committee of Women's Hospital, School of Medicine, Zhejiang University. Informed consent will be obtained from all participants. Results will be disseminated via peer-reviewed journals, and standardized infant developmental reports will be provided to participants to enhance study benefit. Trial registration number: NCT07082530.

15
Human Histone Fragments Display Antibacterial Properties against Pseudomonas aeruginosa

Jaber, N.; Di Somma, A.; Rodriguez-alfonso, A. A.; Cane, C.; Read, C.; Ständker, L.; Wiese, S.; Duilio, A.; Münch, J.; Spellerberg, B.

2026-05-11 microbiology 10.64898/2026.05.11.724237 medRxiv
Top 0.4%
0.8%

BackgroundRising antimicrobial resistance rates, require new therapeutic approaches such as antimicrobial peptides (AMPs), which are part of the innate immune defense, as alternatives to antibiotics. In this study, we aim to unravel the antibacterial activity of human histone H1.2 peptide against Pseudomonas aeruginosa and its potential immune modulatory role. MethodsWe used a hemofiltrate peptide database for antimicrobial peptide prediction to identify novel human AMPs. Thirteen sequences of histone H1 were identified as putative AMPs, synthesized, and tested against bacterial ESKAPE pathogens in a radial diffusion assay. SYTOX green assay, electrophoretic mobility shift assay, and differential proteomics assays were conducted to determine the mode of action of H1.2 peptide fragment. A crystal violet assay was performed to evaluate the inhibition of biofilm formation. The cytotoxicity of the peptide was tested in LDH and Alamar assays. Finally, to visualize the contributions of H1.2 in NETs formation, scanning electron microscopy was performed. ResultsThe H1.2 peptide inhibited the growth of P. aeruginosa in a dose and pH-dependent manner without cytotoxicity towards mammalian THP-1 cells. It acts on intracellular targets to inhibit the growth of P. aeruginosa. STRING analysis from the differential proteomics assay showed that H1.2 targets the downregulation of proteins involved in the biogenesis of outer membrane proteins, including the folding and trafficking of outer membrane proteins across the cytoplasmic membrane. Scanning electron microscopy images showed that H1.2 forms NET-like structures capable of trapping and immobilizing P. aeruginosa. ConclusionThe characterized antimicrobial activity of H1.2 points to a role for human histone H1 fragments in innate immunity and may represent a promising approach for the development of novel antibacterial therapies. 
Graphical Summary: The Sec transport and BAM complex systems, including chaperone proteins and quality control proteases, are inhibited by H1.2 in Pseudomonas aeruginosa. Outer membrane proteins (OMPs) are synthesized in the cytoplasm and transported across the inner membrane via the Sec translocase, assisted by SecA/SecB or ribosomes. In the periplasm, they are escorted by chaperones such as SurA to the BAM complex for insertion into the outer membrane. Here, we show that H1.2, an antimicrobial peptide, targets membrane biogenesis in P. aeruginosa by downregulating the Sec translocase (SecA/SecB and SecYEG), SurA, and the BAM complex, leading to improper transfer, folding, and insertion of OMPs into the outer membrane. Normally, misfolded proteins are degraded by the protease MucD to prevent toxic aggregation in the bacteria. However, with H1.2 inhibiting MucD, the proteotoxic stress is exacerbated, ultimately compromising bacterial homeostasis and viability. Figure created using BioRender.com.

16
Housekeeping Gene Expression Normalization in Transcriptomics Mitigates Data Leakage in Machine Learning Models

Ribas, G. T.; Riella, C. V.; Guizelini, D.; Menegatti Rigo, M.; Riella, L. V.; Borges, T. J.

2026-04-24 bioinformatics 10.64898/2026.04.24.720637 medRxiv
Top 0.4%
0.8%

Background: Inappropriate normalization can lead to data leakage and overfitting in machine learning models. Accurately identifying housekeeping genes (HKGs) can overcome this problem and is crucial for normalizing gene expression data, particularly in RNA-Seq experiments. Results: First, we demonstrate that the gene expression of commonly used HKGs significantly changes over time due to immunosuppressive treatments in transplant recipients. Using large public transcriptomic datasets of kidney transplantation, we developed a pipeline based on the genes' coefficient of variation, stability, and Gini coefficient, and identified nine stable and better-suited HKG candidates. Our results demonstrate that these HKGs improve the robustness and generalizability of machine learning models by minimizing data leakage, as evidenced by superior performance compared to benchmark methods like median ratio normalization and trimmed mean of M values. Conclusions: This approach enables more accurate comparison of gene expression datasets across different clinical scenarios, improving the reliability of biomarker identification and enhancing personalized treatment strategies.
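As a rough illustration of two of the metrics named in the abstract above, the following minimal sketch computes a coefficient of variation and a Gini coefficient per gene and ranks genes by their sum. It is not the authors' pipeline: the function names, the toy expression values, and the simple additive score are assumptions made for illustration, and the abstract's separate "stability" criterion is not modeled here.

```python
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    # Rank-weighted formula on ascending-sorted values.
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2 * weighted / (n * total) - (n + 1) / n


def rank_candidates(expr):
    """Order genes from most to least stable.

    The score (CV + Gini, lower is better) is a hypothetical choice for
    this sketch, not a scoring rule taken from the abstract.
    """
    scores = {g: coefficient_of_variation(v) + gini(v) for g, v in expr.items()}
    return sorted(scores, key=scores.get)


# Toy expression values, made up for illustration (one list per gene).
expr = {
    "GENE_A": [100, 102, 98, 101],
    "GENE_B": [10, 200, 50, 400],
    "GENE_C": [50, 50, 50, 50],
}
ranking = rank_candidates(expr)
```

Genes with low values on both metrics vary little across samples, which is the property a normalization reference needs. In practice the metrics would be computed on training samples only, so that no information from held-out samples enters the normalization.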

17
Inhibition of miR-1307 Reverses Resistance to Cisplatin in Drug-Resistant Oral Squamous Cell Carcinoma

Patel, A.; Patel, V.; Lotia, S.; Patel, K.; Mandlik, D.; Tan, J.; Sampath, P.; Patel, B.; Johar, K.; Bhatia, D. D.; Tanavde, V.; Patel, S.

2026-04-09 cancer biology 10.64898/2026.04.06.709730 medRxiv
Top 0.4%
0.8%

Background: Chemo-resistance remains a major clinical challenge in Oral Squamous Cell Carcinoma (OSCC), attributed to intrinsically resistant cells. Although tumour-derived extracellular vesicles (EVs) have been implicated in cell-cell communication, their role in propagating chemo-resistance remains poorly defined. This study aims to identify salivary EV-associated miRNAs capable of predicting chemo-resistance and to delineate the role of miR-1307-5p in modulating CSC-driven therapeutic refractoriness. Methods: The salivary EV-derived expression profile of miR-1307-5p was assessed by qPCR in chemo-resistant OSCC patients and further validated in TCGA small RNA sequencing datasets. Expression was validated by qPCR and correlated with clinicopathological outcomes. Functional assays including cell-cycle analysis, apoptosis, migration/invasion, 3D spheroids, angiogenesis, and CAM assays were performed in the miR-1307-5p-inhibited CD44 CSC subpopulation compared to its vehicle control. Transcriptomic profiling with cross-referencing to TCGA was conducted to identify potential novel targets of miR-1307-5p. Chemo-sensitisation was assessed by treating the knockdown chemo-resistant cells with low-dose cisplatin and validating the effect using in-vitro functional assays and an orthotopic xenograft model. Results: miR-1307-5p was significantly elevated in salivary EVs of chemo-resistant OSCC patients and correlated with poor overall survival (p = 0.03). The miRNA was markedly enriched in endogenously resistant CD44 CSCs. Silencing of miR-1307-5p induced G2/M arrest, triggered apoptosis, impaired invasion, and reduced angiogenesis in both in-vitro and ex-vivo assays. Transcriptomic profiling, TCGA validation, and integrative pathway analysis identified key oncogenic hubs that converge on the PI3K-AKT, MAPK/ERK, and YAP signalling pathways governing EMT.
Inhibition of miR-1307-5p restored cisplatin sensitivity in resistant CSCs, with low-dose cisplatin producing substantial tumour suppression in-vitro and in-vivo. Reduced CD44 expression in xenograft models confirmed CSC reprogramming. EVs from anti-miR-treated cells conferred chemo-sensitisation upon uptake by resistant CSCs. Xenograft models substantiated that EVs can initiate tumour formation and that EV-mediated delivery of anti-miR-1307-5p drives significant tumour regression. Conclusion: This study identifies salivary EV-derived miR-1307-5p as a clinically relevant biomarker of chemo-resistance in OSCC and reveals its mechanistic role in sustaining CSC-driven therapeutic failure. Targeting miR-1307-5p offers a promising avenue for restoring cisplatin sensitivity and developing exosome-based therapeutic strategies.

18
AI-Driven Reconstruction of the Research Paradigm for Phase Separation in Membraneless Organelle

Ding, Y.; Lu, T.; Li, Y.

2026-04-02 cell biology 10.64898/2026.03.31.715491 medRxiv
Top 0.5%
0.8%

Liquid-liquid phase separation (LLPS) of biomacromolecules is a key mechanism driving the formation of membraneless organelles (MLOs) within cells, playing a crucial role in fundamental biological processes such as cell proliferation and stress response. Accurately understanding and predicting the phase separation propensity of proteins is essential for unraveling the assembly mechanisms of MLOs and their functions under both physiological and pathological conditions. Traditional research methods primarily rely on biochemical experiments, which are limited by low throughput, high cost, and difficulty in systematically exploring sequence-phase transition relationships. This study proposes and implements a novel three-stage, iterative paradigm based on artificial intelligence (AI) to propel phase separation research towards systematization, predictability, and mechanistic understanding.
(1) Benchmark Model Construction: A preliminary predictive model was established based on a Multilayer Perceptron (MLP) neural network, and the driving effect of phenylalanine/tyrosine (F/Y) residue-mediated π-π interactions on LLPS was validated.
(2) Model Robustness Enhancement: The model was optimized through adversarial training strategies, which effectively identified and eliminated misclassifications of "highly disordered non-phase-separating" trap sequences. This significantly improved the model's generalization capability and reliability when handling complex, real-world sequences.
(3) Physical Mechanism Integration and Functional Expansion: Incorporating the Uniform Manifold Approximation and Projection (UMAP) manifold learning method and constraints from non-equilibrium thermodynamics, a "fingerprint space" capable of characterizing the thermodynamic behavior of phase separation was constructed. This space enables cluster analysis of different MLO types, and the model can output a thermodynamic stability score for protein phase separation.
Based on this score, we identified 10 high-confidence candidate proteins with the potential to form novel MLOs. The paradigm established in this study upgrades phase separation prediction from the traditional "binary classification" approach to a novel research framework characterized by "physical mechanism analysis + novel MLO discovery." It provides the phase separation field with a computational tool that combines high accuracy, strong robustness, and good physical interpretability.

19
Integrated Analysis of HeberFERON-Driven Comparative Proteomic regulation in Glioblastoma Cells U-87MG

Vazquez-Blomquist, D.; Besada, V.; Miranda, J.; Ramos, Y.; Palomares, C. S.; Guirola, O.; Bringas, R.; Vonasek, E.; Gil, Y.; Perez, W.; Diaz, T.; Quinones-Vega, M.; Gonzalez, L. J.; Bello-Rivero, I.

2026-04-24 cancer biology 10.64898/2026.04.22.720155 medRxiv
Top 0.5%
0.7%

Glioblastoma is a very aggressive brain tumor with few therapeutic options. HeberFERON, a co-formulation of type I and II interferons (IFNs), has been used in cancer treatment, with promising results in high-grade brain tumors. High-throughput techniques in easy-to-handle models have been important to interrogate biomolecular changes, describe mechanisms, and find pharmacodynamic biomarkers. This study aims to elucidate the effect of HeberFERON on the cell proteome in comparison to its individual IFN components. Proteomic changes with HeberFERON in the glioblastoma-derived cell line U-87MG, in comparison with individual IFN-α2b and IFN-γ, were studied using a nanoLC instrument (EasyLC) coupled to a Velos Pro mass spectrometer; MaxQuant and Perseus were also used. Several enrichment tools, network analysis, and canSAR for drug targets were employed. Translation, RNA processing, mitotic cell cycle, cytoskeleton and chromosome organization, apoptosis, autophagy, and DNA repair were enriched, limiting cell growth, together with changes in immune response components, supporting HeberFERON as a multitarget treatment. The co-formulation is distinguished by its modulation of RNA splicing with the SMN complex, cytoskeleton organization and microtubule-based movement, nuclear envelope breakdown, DNA conformational changes, and oxidative phosphorylation, giving a clearer picture of its effects across a variety of systems inside the tumor cell. Combined with a previous microarray experiment, informative genes and proteins emerged as pharmacodynamic biomarkers of antiproliferative effects (e.g., STAT1/2, CENPE, ATRIP, MAP1B, LIMA1, VCP, and several ribosomal, spliceosome, and proteasomal complex proteins). This study complements previous transcriptomic and phosphoproteomic experiments in this model and underscores HeberFERON as a glioblastoma therapeutic.

20
Exploring the Mechanism of Na⁺/K⁺-ATPase (NKA) and 20-HETE Ligand Interactions by in-silico modeling

Faleel, D.; Arnest, R.; Aradhyula, V.; Boyapalli, S.; Haller, S. T.; Kennedy, D. J.

2026-05-15 bioinformatics 10.64898/2026.05.12.724327 medRxiv
Top 0.5%
0.7%

The Na+/K+-ATPase (NKA) regulates ion balance in the kidney and influences cellular processes like proliferation and apoptosis through its signal transduction. The endogenous ligand 20-hydroxyeicosatetraenoic acid (20-HETE) contributes to inflammation and fibrosis in chronic kidney disease (CKD) and inhibits NKA activity in renal tubules. However, the molecular mechanism of this interaction remains unclear. In this study, we used an in-silico approach to investigate the potential interaction between 20-HETE and NKA. Various ligands, including known NKA ligands such as cardiotonic steroids (CTS), 20-HETE, and negative controls, were docked using rigid and Induced Fit Docking to predict the affinity of the ligands toward NKA. Binding free energy calculations with the Prime molecular mechanics with generalized Born and surface area (Prime MM/GBSA) tool were used to confirm the involvement of key amino acids in ligand-receptor interactions. The docking analyses revealed that 20-HETE exhibited a binding affinity comparable to that of the negative control, with some differences between rigid and induced fit docking. Binding free energy data highlighted key amino acids in the 20-HETE and NKA interaction. In the interaction fingerprint and mutation analyses, mutations such as Ala330Gly and Val329Ala significantly reduced binding free energy, while Thr804Ala showed a notable decrease, underscoring the potential importance of these amino acids in ligand stabilization. These findings provide computational evidence supporting a potential direct interaction between 20-HETE and NKA and identify candidate residues for future experimental validation.
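For readers unfamiliar with the quantities compared in the abstract above, the usual MM/GBSA end-point definition of binding free energy, and the mutation-induced change derived from it, can be written as follows. This is the generic textbook form, not a formula taken from the preprint; the subscripts (complex, receptor, ligand, mut, wt) are notation introduced here for illustration.

```latex
\begin{align}
\Delta G_{\text{bind}} &= G_{\text{complex}} - \left( G_{\text{receptor}} + G_{\text{ligand}} \right) \\
\Delta\Delta G_{\text{bind}} &= \Delta G_{\text{bind}}^{\text{mut}} - \Delta G_{\text{bind}}^{\text{wt}}
\end{align}
```

Under this convention a more negative $\Delta G_{\text{bind}}$ indicates more favorable predicted binding, so a positive $\Delta\Delta G_{\text{bind}}$ for a substitution such as Ala330Gly would indicate that the mutated residue contributes to ligand stabilization in the wild type.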